fix: adapt SDK storage layer to crawlee v4 StorageClient interface#595
fix: adapt SDK storage layer to crawlee v4 StorageClient interface#595B4nan wants to merge 18 commits into
Conversation
fba9532 to
777f5d6
Compare
barjin
left a comment
There was a problem hiding this comment.
A few ideas, but lgtm overall. Thanks!
@janbuchar might have more ideas as the original author
| const id = | ||
| options?.id ?? | ||
| (options?.name | ||
| ? (await this.client.datasets().getOrCreate(options.name)).id | ||
| : undefined); |
There was a problem hiding this comment.
Note that if we implement storageExists, Crawlee should be able to resolve the identifiers on its own(?)
| * implementation (which produces a `file://` URL or returns `undefined`). | ||
| */ | ||
| override getPublicUrl(key: string): string { | ||
| override async getPublicUrl(key: string): Promise<string | undefined> { |
There was a problem hiding this comment.
Maybe this change closes this issue - I'm not sure if the renaming was a requirement, or a way to make this BC.
| // `client` is `private` on `CoreKeyValueStore`; bypass the visibility | ||
| // check to fetch the per-store secret. There is no public crawlee API | ||
| // surface for this yet — track upstream exposure as a follow-up. |
There was a problem hiding this comment.
Good point, please create the issue in Crawlee (I see that Dataset and RequestProvider already both have client public, so it's weirdly asymmetrical now).
|
@B4nan did you check parity with apify-sdk-python? I assume that there will be gaps in the request queue implementation, which is fine at this point. |
f5d0e03 to
94e8aa4
Compare
f271efa to
83afcc1
Compare
94e8aa4 to
c78bc3d
Compare
83afcc1 to
5bd8f93
Compare
f8aa684 to
076c1eb
Compare
5bd8f93 to
e56ccdf
Compare
076c1eb to
cc2c4dd
Compare
10b52d7 to
cf11503
Compare
89d0f5c to
b4ea1e5
Compare
cf11503 to
ef84e4e
Compare
b4ea1e5 to
fd14806
Compare
ef84e4e to
49e27d2
Compare
fd14806 to
c1f975a
Compare
49e27d2 to
036a6a0
Compare
c1f975a to
79c7560
Compare
9067d2f to
2083fb2
Compare
dec3f6c to
43cf6a4
Compare
2083fb2 to
4001867
Compare
…uration crawlee 4.0.0-beta.56 ships `Configuration.reset()` (apify/crawlee#3649), so the SDK's override can delegate to `super.reset()` instead of calling `serviceLocator.reset()` directly. The SDK still owns clearing its own `globalConfig` static and replacing the `AsyncLocalStorage` singleton. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Crawlee v4's `EventManager` constructor now requires `EventManagerOptions` (just `persistStateIntervalMillis`), and the base class no longer carries a `config` field — the previous `override readonly config` pattern is no longer valid. - Drop the `override` and store `config` as own readonly property. - Forward `persistStateIntervalMillis` to `super()`. - Add a `fromConfig()` factory mirroring `LocalEventManager.fromConfig()` so the SDK plays nicely with the new ServiceLocator-driven init path. Stacked on #583 (config redesign); rebases onto v4 once that lands.
`Configuration.useEventManager()` was removed in crawlee v4. Install the platform event manager via the global service locator instead, and reset between tests so each case can register a fresh manager without hitting `ServiceConflictError`.
Crawlee v4's `Configuration` is eager — `actorEventsWsUrl` is read once at construction, so a global config that pre-existed the `beforeEach` would never see the websocket URL we set, and `events.init()` would silently never connect. Move the env-var setup above `Configuration.getGlobalConfig()` and reset the SDK's static singleton so each test rebuilds a fresh config.
Crawlee v4 reshaped its `StorageClient` interface (async factory methods that accept `id` *or* `name`), removed the cached `storageObject` from `KeyValueStore`, and made `getPublicUrl` async. The existing SDK code targeted the v3 shape and no longer compiles. Changes: - New `ApifyStorageClient` adapter wraps `apify-client`'s legacy `dataset()/keyValueStore()/requestQueue()` accessors and exposes the `createDatasetClient/createKeyValueStoreClient/createRequestQueueClient` factories crawlee now expects. Names are resolved to IDs via the collection `getOrCreate(name)` calls. apify-client's resource clients don't yet implement v4-only members like `getMetadata` / `getRecordPublicUrl`; the adapter casts through with a TODO comment so the structural alignment can land separately upstream. - `Actor.init` and `_openStorage` now wrap `this.apifyClient` in `ApifyStorageClient` before handing it to crawlee. - `KeyValueStore.getPublicUrl` is now async; the per-store `urlSigningSecretKey` is fetched on demand via the (private) `client.getMetadata()` instead of the removed `storageObject` cache. URL-signing behaviour for platform-mode reads is preserved. - `Actor.openRequestQueue` reads `totalRequestCount` via the new `client.getMetadata()` (the old `client.get()` was dropped). - `StorageManager.openStorage` is now `(class, id?, client?)` — removed the trailing `this.config` argument. Stacked on #583 (config redesign); rebases onto v4 once that lands.
Replace the removed `StorageManager.clearCache()` and `Configuration.useStorageClient()` with `serviceLocator.reset()` plus `serviceLocator.setStorageClient()`.
…n emulator init/destroy
…apter - `openRequestQueue should open storage`: mock client uses `getMetadata()` (the v3 `get()` was dropped on RequestQueueClient). - Both Storage API tests assert that StorageManager.openStorage is called with an ApifyStorageClient (matched structurally) instead of the raw ApifyClient — the SDK now wraps it for crawlee v4.
Actor.init() calls Configuration.storage.enterWith(this.config), which sticks the resolved config onto the current async context and persists across tests on Node 22 (but not Node 24+). The cached value short- circuits Configuration.getGlobalConfig() so subsequent tests never see the env vars they just set. Reset the AsyncLocalStorage value alongside the other singletons in the test emulator so addWebhook (and friends) see ACTOR_RUN_ID etc.
…eset `Actor.init()` calls `Configuration.storage.enterWith(this.config)`, which sets the AsyncLocalStorage value on whichever async context the test runner happened to be on. `enterWith(undefined)` from a child async branch (vitest's beforeEach) doesn't unwind that — on Node 22 the test body re-enters a sibling context where the original `enterWith` is still in effect, so `getStore()` still returns the stale Configuration even after our reset. Swapping the entire `AsyncLocalStorage` instance for a fresh one guarantees `getStore()` returns `undefined` for every async branch that follows, fixing the addWebhook test failures on Node 22.
…f inline boilerplate
4001867 to
112d8f2
Compare
beta.56 (apify/crawlee#3584) renamed `StorageManager` → `StorageInstanceManager` and reshaped the public storage open path. The static `StorageManager.openStorage(cls, id, client)` helper is gone; each storage class now exposes `static open(id, options?)` with a `storageClient` option for routing through a custom backend. - `actor.ts`: `_openStorage` now calls `storageClass.open(id, { storageClient })` instead of `StorageManager.openStorage(...)`. `StorageOpenOptions` replaces `StorageManagerOptions`. - `key_value_store.ts`: import `StorageOpenOptions` for the `open()` override signature. - `actor.test.ts`: the `openDataset` / `openRequestQueue` `forceCloud` tests now spy on the storage class's own `open()` (no more `StorageManager.prototype`), and assert the `storageClient` lives one level deeper in the options object. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
112d8f2 to
5ea0f5e
Compare
Summary
Crawlee v4 reshaped the
StorageClientinterface, removed the cachedstorageObjectfromKeyValueStore, and madegetPublicUrlasync. The SDK still targeted the v3 shape and no longer compiled. This PR adapts:ApifyStorageClientadapter wrappingapify-client's legacydataset()/keyValueStore()/requestQueue()accessors and exposing the factory methods (createDatasetClient/createKeyValueStoreClient/createRequestQueueClient) crawlee now requires. Name → ID resolution goes through each collection'sgetOrCreate(name).Actor.initand_openStoragenow wrapthis.apifyClientin the adapter before handing it to crawlee/StorageManager.KeyValueStore.getPublicUrlis nowasync; the per-storeurlSigningSecretKeyis fetched on demand via the (private)client.getMetadata()rather than the removedstorageObjectcache. URL-signing behaviour in platform mode is preserved.Actor.openRequestQueuereadstotalRequestCountviaclient.getMetadata()(the oldclient.get()is gone).StorageManager.openStoragesignature is now(class, id?, client?)— dropped the trailingthis.configargument.Known gaps tracked in this PR
apify-client'sDatasetClient/KeyValueStoreClient/RequestQueueClientdon't yet implement v4-added members likegetMetadataandgetRecordPublicUrl. The adapter casts through withas unknown as ...for now — the proper fix is to bringapify-client's resource client shapes into structural alignment with@crawlee/types, which is out of scope here.The
KeyValueStoreInfotype in@crawlee/typesalso doesn't declareurlSigningSecretKey, so the SDK uses a local intersection type to access it. Worth surfacing upstream eventually.Stacking
Depends on #583 (config redesign). Rebases cleanly onto v4 once that lands.